# README

Overview of Programs
---------------------------

The code in this replication package uses `R` and `matlab` to produce all tables and figures presented in the manuscript and online appendix. The tables and figures are produced in four steps:

- **Step 1: Data preparation.** This step prepares the raw confidential micro-data in `R`. It requires access to the confidential micro-data discussed below. The replication code for this step is confidential and is available internally to qualified researchers from the Statistics of Income division of the Internal Revenue Service (IRS).

- **Step 2: Calculation of descriptive statistics.** This step calculates summaries of the raw confidential micro-data (from the previous step) in `R`. This step also produces Tables 2.1, A.1, and A.2, as well as Figure B.2. The replication code for this step is available in subdirectory `2-descriptive-statistics/` of our replication package, to be run after having run step 1.

- **Step 3: Micro-data estimation.** This step performs estimation using the confidential micro-data to produce intermediate statistical extracts in `R`. This step requires access to the confidential micro-data produced by steps 1 and 2. The replication code for this step is available in subdirectory `3-estimation/` of our replication package, to be run after having run steps 1 and 2.

- **Step 4: Analyses of statistical extracts.** Given the statistical extracts and process estimates produced by steps 1 to 3, this step performs the final analyses and produces all remaining tables and figures in `R`, excluding Table A.9, which is a literature review table. The replication code for this step is available in subdirectory `4-analyses/` of our replication package, to be run after having run steps 1 through 3.

Note: some scripts may use API access to pull in publicly available data; these requests may be blocked by a firewall.

Step 1 -- Data preparation
---------------------------

### Overview of this Step

This step prepares the raw confidential micro-data in `R`. It requires access to the confidential micro-data discussed below. The replication code for this step is confidential and is available internally to qualified researchers from the Statistics of Income division of the IRS.

### Data Provenance and Availability

The project relied on US Treasury confidential de-identified micro-data to which access is restricted. We accessed this data through the Joint Statistical Research Program, a program administered by IRS Statistics of Income (SOI) to allow government-academic research partnerships. All data analyses occurred onsite at a secure IRS facility and were supervised and reviewed by SOI staff.

We obtained access to the data by submitting a project proposal in response to a call for proposals posted by the SOI on this [website](https://www.irs.gov/statistics/soi-tax-stats-joint-statistical-research-program). The data may be obtained by submitting a proposal in response to the next call for proposals. Note that it can take some months to gain access to the data once a proposal is approved. The authors will assist with any reasonable replication attempts for two years following publication.

### Statement about Rights and Availability

We certify that the authors of the manuscript have legitimate access to and permission to use the data used in this manuscript. No data can be made publicly available.

### Instructions to Replicators

Data preparation is performed by the `CDW-dmn` replication pack, which is available to internal researchers on the SOI server. It includes a README file that explains how to prepare the software environment and source the data construction script, which is written in `R`. See the Data Provenance and Availability section above for further details on accessing this data. Once access is obtained, email the SOI administrator to request the replication pack. Run time for this step is approximately 40 hours.

Step 2 -- Calculation of descriptive statistics
---------------------------

### Overview of this Step

This step calculates summaries of the raw confidential micro-data (from the previous step) in `R`. This step also produces Tables 2.1, A.1, and A.2, as well as Figure B.2.

### Software Requirements

All calculations in this step are performed using `R` on the SOI server, where `R` and its packages are installed by server administrators (the code was most recently tested with `R` version 3.5.3). At the time of most recent testing, all required packages were installed on the SOI server. If some are no longer available, ask the server administrator to install them.

*Packages available from GitHub:*

- `setzler/eventStudy/eventStudy`
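
Before starting a long run, it can help to confirm that the required packages are still installed. The following is a minimal sketch, not part of the replication package: `missing_packages` is a hypothetical helper, and the `required` vector lists only the one package named in this README, so it should be extended with whatever the scripts load.

```r
# Hypothetical helper (not part of the replication package):
# return the subset of `pkgs` that cannot be loaded in this R session.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Only eventStudy is named in this README; extend this vector as needed.
required <- c("eventStudy")

not_installed <- missing_packages(required)
if (length(not_installed) > 0) {
  # On the SOI server, packages are installed by the server administrator.
  message("Ask the server administrator to install: ",
          paste(not_installed, collapse = ", "))
}

# Outside the SOI server, a GitHub path of this form is typically installed
# with devtools (assumption: devtools is available and GitHub is reachable):
# devtools::install_github("setzler/eventStudy/eventStudy")
```

If the message lists any packages, pass that list on to the server administrator before running the scripts.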

### Instructions to Replicators

In order to execute step 2, complete the following tasks in order:

- copy the `code` subdirectory (which contains `2-descriptive-statistics/`) to the home directory on the SOI server (with appropriate permission from the server administrator)
- run scripts in `2-descriptive-statistics/` in order (from 01 to 06); run time is approximately 5 hours.
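
Running the scripts in order can be sketched as a small loop in `R`. This is an illustration under stated assumptions rather than the authors' own driver: `run_step` is a hypothetical helper, and it assumes that the script file names begin with their two-digit number, end in `.R`, and can be sourced from the current working directory.

```r
# Hypothetical helper (not part of the replication package):
# source every numbered R script in a step directory, in ascending order.
run_step <- function(step_dir, pattern = "^[0-9]{2}.*\\.[Rr]$") {
  # Zero-padded two-digit prefixes make alphabetical order match numeric order.
  scripts <- sort(list.files(step_dir, pattern = pattern, full.names = TRUE))
  if (length(scripts) == 0) {
    stop("No numbered scripts found in ", step_dir)
  }
  for (script in scripts) {
    message("Running ", basename(script))
    source(script)  # an error in any script stops the loop at that script
  }
  invisible(basename(scripts))
}

# Example call, assuming the working directory is the copied `code` directory:
# run_step("2-descriptive-statistics")
```

Because an error halts the loop, a failed script is never silently skipped, and the run can be resumed by hand from the script that failed.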

Step 3 -- Micro-data estimation
---------------------------

### Overview of this Step

This step performs estimation using the confidential micro-data to produce intermediate statistical extracts in `R`. This step requires access to the confidential micro-data produced by steps 1 and 2.

### Software Requirements

All estimation in this step is performed using `R` on the SOI server, where `R` and its packages are installed by server administrators (the code was most recently tested with `R` version 3.5.3). At the time of most recent testing, all required packages were installed on the SOI server. If some are no longer available, ask the server administrator to install them.

*Packages available from GitHub:*

- `setzler/eventStudy/eventStudy`

### Instructions to Replicators

In order to execute step 3, complete the following tasks in order:

- copy the `code` subdirectory (which contains `3-estimation/`) to the home directory on the SOI server (with appropriate permission from the server administrator)
- run scripts in `3-estimation/` in order (from 01 to XX); run time is approximately 30 hours.

Step 4 -- Analyses of statistical extracts
---------------------------

### Overview of this Step

Given the statistical extracts and process estimates produced by steps 1 to 3, this step performs the final analyses and produces all remaining tables and figures in `R`, excluding Table A.9, which is a literature review table.

### Software Requirements

Almost all table and figure creation in this step is performed using `R` on the SOI server; the exceptions are a few calculations done locally in `R` and `matlab`. On the server, `R` and its packages are installed by server administrators (the code was most recently tested with `R` version 3.5.3). At the time of most recent testing, all required packages were installed on the SOI server. If some are no longer available, ask the server administrator to install them.

*Packages available from GitHub:*

- `setzler/eventStudy/eventStudy`

### Instructions to Replicators

In order to execute step 4, complete the following tasks in order:

- copy the `code` subdirectory (which contains `4-analyses/`) to the home directory on the SOI server (with appropriate permission from the server administrator)
- run scripts in `4-analyses/` in order (from 01 to 46); run time is approximately 3 hours.



